ABSTRACT

The problem addressed in the study is a comparison of three
methods of factor-analyzing dichotomously-scored item response
data. This comparison was accomplished by using phi and
tetrachoric correlations among dichotomous data, and Pearson
product-moment correlations among Rasch probability estimates of
the same dichotomous data, in factor analysis. The Rasch approach,
as a psychometric measurement model, was chosen because it met the
assumption of a linear ability continuum underlying dichotomous
item response data. Results indicated the superiority of the
Rasch-based technique for factor-analyzing dichotomously-scored
item response data.

Rasch-based Factor Analysis
of
Dichotomously-scored Item Response Data

INTRODUCTION

In the behavioral and cognitive sciences, factor analysis
often involves test item responses assuming only two values, either
zero or one (Lawley, 1943; Mulaik, 1972; Cureton & D'Agostino,
1983; Gerbing, 1989; Goldstein & Wood, 1989). Previous attempts to
solve the estimation problem in the factor analysis of such
dichotomous data have focused on the selection of an appropriate
measure of correlation, r, φ, or r_tet, as an estimate of the
unobservable correlations in the population. Pearson product-moment
correlations (r), however, are intended for use with continuous,
interval-level data; phi coefficients (φ) with dichotomous data;
and tetrachoric coefficients (r_tet) with dichotomous data that are
assumed to have underlying continuity (Hinkle, Wiersma, & Jurs,
1988).

Mislevy (1986, pp. 9-10) listed three shortcomings of φ:

(1) Values of φ are dependent not only upon the strength of
the relationship between variables, but also upon the
difference between their mean values. The phi coefficient
can attain extreme values of -1 or +1 only when the two
correlated variables have equal means.

(2) The expression for φ is generally augmented by terms that
depend on the skewness of the discrete variables, which in
the dichotomous case is a function of their mean values.

(3) When binary variables are the result of dichotomizing
continuous variables, the placement of the cutting point
directly affects the value of the expected φ coefficients.
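
The first shortcoming can be illustrated with a small numerical sketch. The function name, the cell labels a, b, c, d, and the two example tables below are illustrative assumptions, not data from this study; the formula is the standard 2 x 2 expression for φ.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of two dichotomous items X and Y.

    a = both correct, b = only X correct,
    c = only Y correct, d = both incorrect (hypothetical labels).
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (a * d - b * c) / denom


# Equal item means (.5 and .5) with perfect agreement.
equal_means = phi_coefficient(50, 0, 0, 50)

# Unequal item means (.5 and .2) with the strongest association the
# margins allow: everyone who passes Y also passes X.
unequal_means = phi_coefficient(20, 30, 0, 50)

print(equal_means, unequal_means)
```

In the second table the relationship is as strong as the marginal totals permit, yet φ stays well below unity, which is the dependence on item means described in point (1).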

The tetrachoric correlation coefficient (r_tet) has also been
utilized for dichotomous variables possessing underlying continuous
normal distributions (Kachigan, 1986, p. 211; Lord, 1980, p. 39).
Several authors, including Bock and Lieberman (1970), Crocker and
Algina (1986, p. 320), and Jöreskog and Sörbom (1986, p. IV.3), have
recommended r_tet rather than φ for factor analyzing dichotomous
data, although Lord (1980, p. 21) asserted that "Jöreskog's maximum
likelihood factor analysis and accompanying significance tests are
not strictly applicable to tetrachoric correlation matrices".

Usage and interpretation problems, however, hinder the use of
r_tet. Lord (1980, p. 21) warned that "tetrachoric correlations
cannot usually be strictly justified" when verifying the
unidimensionality of a set of test items, because tetrachoric
correlations are "inappropriate for non-normal distributions of
ability; they are also inappropriate when the item response
function is not a normal ogive [and] whenever there is guessing"
(ibid., p. 20). Muthén (1989) noted that:

(1) r_tet matrices are not assured positive definiteness, which
may indicate a violation of the underlying normality
assumption, or which may reflect sampling variability (p. 24).

(2) r_tet matrices generally yield extremely inflated chi-square
values and underestimated standard errors of estimate, as
compared with Pearson r matrices (p. 24).

(3) The assumption of underlying normality is questionable
when the mean value of a dichotomous variable departs
appreciably from .5 (p. 27).

In addition, Guilford and Fruchter (1973, pp. 300-306)
observed that:

(4) r_tet is less reliable than r, since it is at least 50
percent more variable.

(5) Large samples (N = 200 or 300) are required when r_tet is
used to estimate the degree of correlation in the population,
although sample sizes of about 100 can be used to test the
null hypothesis of zero population correlation.

(6) r_tet should be avoided when extreme skewness is present, as
when p or q = .90 or so, because the standard error is very
large in such cases.

(7) r_tet must be avoided when only one cell in the pq by p'q'
matrix is empty, or when one cell exhibits a much smaller
frequency than the other three. In general, the distribution
should be fairly symmetrical along one matrix diagonal or the
other.
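
Point (7) can be made concrete with the classical cosine-pi approximation to the tetrachoric coefficient. This is a rough sketch for illustration only: the approximation is a stand-in chosen for brevity, not the estimator used in this study, and the function name and example tables are hypothetical.

```python
import math


def cos_pi_tetrachoric(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a and d are the concordant cells, b and c the discordant cells
    of a 2x2 table (hypothetical labels).
    """
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("approximation is undefined for this table")
    # An empty discordant cell makes the cross-product ratio infinite.
    ratio = math.inf if bc == 0 else ad / bc
    return math.cos(math.pi / (1.0 + math.sqrt(ratio)))


# A fairly symmetrical table gives an interior value.
balanced = cos_pi_tetrachoric(40, 10, 10, 40)

# One empty cell forces the estimate to the boundary, regardless of
# how the remaining three cells are distributed.
empty_cell = cos_pi_tetrachoric(40, 10, 0, 50)

print(balanced, empty_cell)
```

With an empty cell the estimate collapses to a boundary value, so the table no longer carries usable information about the strength of the underlying relationship.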



Kim and Mueller (1978, pp. 74-75) offered the following
general proscriptions against the factor analysis of dichotomous
variables:

"Nothing can justify the use of factor analysis on dichotomous
data except a purely heuristic set of criteria. . . . Even in
dichotomies, the use of phi's can be justified if factor
analysis is used as a means of finding general clusterings of
variables and if the underlying correlations among variables
are believed to be moderate, say less than .6 or .7. . . . If
the researcher's goal is to search for clustering patterns,
the use of factor analysis may be appropriate. . . . One way
of doing this is to use tetrachoric correlations instead of
phi's. This approach is only heuristic because the
calculation of tetrachorics can often break down and the
correlation matrix may not be Gramian".



A rigorous investigation of dichotomous test data, however, was
pursued by researchers who were interested in much more than "a
purely heuristic" search for "general clusterings of variables".
Statistical techniques that enabled simultaneous investigations of
three or more variables were used, of which at least one was often
unobserved or "latent" (Loehlin, 1987). These techniques were
collectively labeled by Bentler (1980) as "latent variable
analysis".

Other authors have also recommended various approaches.
Christoffersson (1975) described an approach to the factor analysis
of dichotomized variables based on the distribution of the first-
and second-order joint probabilities of item values using a
generalized least-squares estimation procedure, with r_tet matrices.
Wise and Tatsuoka (1986) used order analysis, modified to include
item proximity information, to identify the dimensionality of
dichotomous data. Stage (1988) analyzed a dichotomous criterion
variable using stepwise logistic regression in conjunction with
LISREL output. The LISREL measurement model produced
unstandardized weightings on the observed variables which were used
to create latent predictor variate values, similar to "factor
scores", for the subsequent logistic regression analysis. Kim,
Nie, and Verba (1977) recommended factor-analyzing tetrachoric
correlations calculated from threshold values obtained from a
2-parameter IRT model, instead of phi correlations calculated
directly from raw dichotomous data. This procedure was advocated
on the conditions that (a) the underlying factors can be assumed to
have normal distributions, and (b) each observed binary variable
can be conceived to be the result of dichotomizing potentially
continuous underlying variables. The authors argued that many
practical factor analysis problems met these assumptions, since "a
normally distributed continuous variable can exhibit a variety of
observed forms", including skewness (p. 59).

Muthén (1984) proposed a structural equation model with a
generalized measurement part allowing observed variables to take on
dichotomous, ordered categorical (e.g., Likert), and/or continuous
values. The model required an assumption of distributional
normality for the latent continuous variables presumed to underlie
the observed dichotomous or categorical indicators, even though
Muthén "does not believe that underlying normality is always the
most appropriate specification". Dichotomous variable analysis
using Muthén's GLS model involved serious drawbacks: (a) problems
inherent in phi and tetrachoric correlation assumptions and
interpretations; (b) a model "is identified if and only if its
parameters are identified" in all three parts (p. 118); and (c) the
use of maximum likelihood estimates of sample threshold values.
This procedure was inferior to a response modeling approach, as
Muthén apparently later realized (1989). Takane and de Leeuw
(1987) formally proved that the marginal probabilities of
dichotomous variate values obtained from the 2-parameter IRT model
and from factor analysis were equivalent. They, however, used a
factor-analytic procedure of the type described by Muthén (1984).
Muthén (1989) later attempted to solve some of the estimation and
specification problems of factor analyzing dichotomous data. He
recognized that both sides of the factor model must be continuous,
and that a response model was needed for the continuous latent
variable, "the binomial distribution of which is described by
probabilities" (p. 21). Muthén's error lay in choosing a
2-parameter IRT model to derive continuous probability values for
x', and in using r_tet correlations to measure the associations in
x'. In justifying his use of tetrachoric correlations, Muthén
(1989) stated:

Because the [x'] variables are continuous and unlimited,
[tetrachorics] are the proper correlations to analyze. It is
also well known that the phi coefficients are attenuated
relative to the tetrachoric correlations (p. 22).

Muthén neglected to mention that tetrachorics are designed for
observed dichotomous variables with underlying assumed continuous
normal distributions. If the continuous variables are observable,
either directly or as the result of a response model, it is
pointless to use tetrachorics when there is a universally accepted
measure of association designed specifically for such continuous,
interval-level data, the Pearson product-moment correlation.

The search for an adequate direct measure of correlation among
dichotomous variables is doomed in the context of factor analysis
because there is a fundamental error of specification motivating
such research. The factor model expresses a linear relationship,
but the regression of a dichotomous variable on a continuous
variable is illogical (Muthén, 1989, pp. 20-21). The factor model
works only when x_i is continuous.

A synthesis of the research findings indicates specific
premises concerning the latent variable analysis of dichotomous
item-response data. First, latent variable analysis requires a
better measure of association than either phi or tetrachoric
correlations. Second, dichotomous item responses should be
converted to continuous interval-level data before they are
subjected to latent variable analysis. Third, the
interrelationships among such item data can be measured with Pearson
product-moment correlations, which are better suited to latent
variable analysis than either phi or tetrachoric correlations.

Classical test theory unfortunately offers no explanation of
how examinees at different ability levels perform on test items.
The Rasch model, as a latent trait theory, however, does permit
estimation of the influence of ability on item performance (Crocker
& Algina, 1986). The Rasch logistic function provides transformed
score values that indicate equal-interval locations along a latent
linear ability continuum (Wright & Stone, 1979). The advantages of
the Rasch model over other psychometric measurement models are
(Wright, 1977):

(1) The Rasch model's assumption of independent estimation of
ability and difficulty yields score calibrations that are both
sample-free and test-free.

(2) Parameter estimates in the Rasch model are unbiased,
consistent, efficient, and sufficient.

(3) The Rasch model is the only mathematical formulation for
the ogive-shaped response curve that allows independent
estimation of ability and difficulty.

The problem addressed in the study therefore is a comparison of
three methods of factor-analyzing dichotomously-scored item
response data. This comparison was accomplished by using phi and
tetrachoric correlations among dichotomous data, and Pearson
product-moment correlations among Rasch probability estimates of
the same dichotomous data, in factor analysis. This methodology
involved (a) Rasch procedures to convert dichotomous item response
data to continuous, interval-level logit values for each item, (b)
computation of Pearson product-moment correlations among item logit
values, and then (c) use of Pearson correlations among items in
factor analysis. This method was compared to prior approaches
using phi and tetrachoric correlations among items in factor
analysis.

METHODS AND PROCEDURES

The latent variable representation for factor analysis is
given in equation [1]:

$$ x = \Lambda \xi + \delta \tag{1} $$

where: x = a vector of q observed variables
       Λ = a (q × n) matrix of factor loadings of x on ξ
       ξ = a vector of n common factors (latent variables)
       δ = a vector of q errors in measuring x; unique
           factors; residuals.

Since the factors ξ and the residuals δ are assumed to be
continuous, interval-level random variables, the observed variables
x_i must abide by the same assumptions in order for the factor model
to apply (Muthén, 1989, p. 20). It is also assumed in this study
that δ is uncorrelated with ξ (Jöreskog & Sörbom, 1986, p. I.6),
and that q > n (Long, 1983, p. 22).

Since the dependent variables x_i are observed and the
independent common factors ξ are not, the parameters contained in
Λ and δ from equation [1] cannot be directly estimated by
regressing x on ξ, as in regression analysis. Instead, indirect
estimation is achieved by decomposing the population covariances
among the x variables according to the covariance equation [2]:

$$ \Sigma = \Lambda \Phi \Lambda' + \Theta_\delta \tag{2} $$

in which:

Σ = (q × q) matrix of covariances among q observed variables x
Λ = (q × n) loadings of x_i on ξ, as in equation [1]
Φ = (n × n) matrix of covariances among n common factors ξ
Λ' = (n × q) transpose of Λ
Θ_δ = (q × q) matrix of covariances among q unique factors δ
(Long, 1983, pp. 25, 33-34).

If the variables are standardized to mean value zero and unit
variance, then Σ, Φ, and Θ_δ consist of population correlations.
In this study, all such matrices are assumed to contain
correlations in the off-diagonal positions, and ones in the
principal diagonals.
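
The decomposition in equation [2] can be traced through a small numerical sketch. The loadings, the four-variable two-factor layout, and the helper names below are hypothetical; the unique variances are simply chosen so that the implied matrix has a unit diagonal, which is an assumption made for the illustration.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def implied_matrix(lam, phi, theta):
    """Sigma = Lambda * Phi * Lambda' + Theta_delta (equation [2])."""
    common = matmul(matmul(lam, phi), transpose(lam))
    size = len(common)
    return [[common[i][j] + theta[i][j] for j in range(size)]
            for i in range(size)]


# Hypothetical loadings: variables 1-2 on factor 1, variables 3-4 on factor 2.
lam = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]]
# Orthogonal (uncorrelated) common factors.
phi = [[1.0, 0.0], [0.0, 1.0]]
# Unique variances chosen so each standardized variable has variance one.
theta = [[(1.0 - sum(l * l for l in lam[i])) if i == j else 0.0
          for j in range(4)] for i in range(4)]

sigma = implied_matrix(lam, phi, theta)
for row in sigma:
    print([round(value, 4) for value in row])
```

Variables loading on the same factor receive an implied correlation equal to the product of their loadings, while variables loading on different orthogonal factors receive an implied correlation of zero.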

From a sample of observed data, a sample correlation matrix S
is calculated. Then S is used to derive an estimated population
correlation matrix such that (equation [3]):

$$ \hat{\Sigma} = \hat{\Lambda} \hat{\Phi} \hat{\Lambda}' + \hat{\Theta}_\delta \tag{3} $$

the components of which are estimates of the population parameters
in equation [2]. The problem of estimation in confirmatory factor
analysis is finding values for the elements of Λ, Φ, and Θ_δ such
that the predicted Σ is as close as possible to the observed S
(Long, 1983, p. 35). The purpose of estimation is to infer
population parameter values that best explain the observed
relationships among sample data.

The Rasch analytic procedure uses a dichotomous persons-by-items
response matrix to calculate estimates of person abilities
(b_v) and item difficulties (d_i). These values are then used to
estimate the probability p that person v will give a correct
response (x_vi = 1) to item i:

$$ p\{x_{vi} = 1 \mid b_v, d_i\} = \frac{\exp(b_v - d_i)}{1 + \exp(b_v - d_i)} $$

(Wright & Stone, 1979, p. 15). The Rasch model is used to
calculate continuous probability values and thereby solve the
problems of estimation and specification in the factor analysis of
dichotomous data.
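
The response probability above can be written as a short function. This is a minimal sketch: the function and argument names are illustrative, and the ability and difficulty values are made-up logits rather than calibrations from this study.

```python
import math


def rasch_probability(ability, difficulty):
    """P(x_vi = 1 | b_v, d_i) = exp(b_v - d_i) / [1 + exp(b_v - d_i)].

    Written in the algebraically equivalent form 1 / (1 + exp(-(b - d))).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty.
p_matched = rasch_probability(0.5, 0.5)

# A person one logit above the item difficulty.
p_one_logit = rasch_probability(1.0, 0.0)

print(p_matched, p_one_logit)
```

The probability depends only on the difference between ability and difficulty, so it rises smoothly with ability for a fixed item and falls with difficulty for a fixed person.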

The relationship between the Rasch and factor models is

$$ x' = \Lambda \xi + \delta , $$

where

$$ x' = p\{x_{vi} = 1 \mid b_v, d_i\} = \frac{\exp(b_v - d_i)}{1 + \exp(b_v - d_i)} , $$

in which x' is continuous and x_vi is dichotomous.

Two raw score data sets were generated and analyzed separately.
One data set reflected normally-distributed latent traits and the
other reflected uniformly-distributed latent traits. Twenty
"observed" dichotomous variables x_i were constructed, consisting of
two subsets of 10 variables. Each subset was hypothesized to load
on one of two factors, F1 and F2, which were assumed to be mutually
orthogonal (uncorrelated). Each x_i had a unique error factor δ_i.
A graphical representation of this factor analysis model is shown
in Figure 1.



Insert Figure 1 Here 



By subjecting the 20 x_i vectors to Rasch analysis, continuous
response probability values were obtained. These values were then
substituted for the dichotomous values of x_i, and the resulting
continuous variable x'_i replaced x_i in the factor model in Figure
1.

The 20 dichotomous x_i vectors were constructed by letting each
subset of 10 x_i variables represent a subtest of 10 items with
incremental difficulties. Thus, there were 11 possible scores on
each subtest, from zero correct to 10 correct. With the 10 x_i
vectors of each subset arranged in columns with difficulty
increasing from left to right, from x_1 to x_10 and from x_11 to
x_20, the most likely response pattern for any raw score was a
Guttman pattern (Douglas & Wright, 1990). It can be shown that for
any raw score attainable by two or more distinct response patterns,
the second-most-likely pattern can be reproduced by transposing the
Guttman (1,0) sequential response pair into the anti-Guttman pair
(0,1), as indicated in Table 1.
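
The two response patterns described above can be generated mechanically. The sketch below assumes items are ordered from easiest to hardest and that the transposed pair is the one straddling the boundary between the last correct and the first incorrect item; the function names are hypothetical.

```python
def guttman_pattern(raw_score, n_items=10):
    """Most likely pattern: the raw_score easiest items correct, the rest wrong."""
    return [1] * raw_score + [0] * (n_items - raw_score)


def next_most_likely_pattern(raw_score, n_items=10):
    """Swap the boundary (1,0) pair of the Guttman pattern into (0,1)."""
    if not 1 <= raw_score <= n_items - 1:
        raise ValueError("only one pattern exists for a zero or perfect score")
    pattern = guttman_pattern(raw_score, n_items)
    last_correct = raw_score - 1
    pattern[last_correct], pattern[last_correct + 1] = 0, 1
    return pattern


for r in (1, 5, 9):
    print(r, guttman_pattern(r), next_most_likely_pattern(r))
```

Both patterns yield the same raw score, so the transposition introduces misfit without changing a person's position on the score scale.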



Insert Table 1 Here 



The np values in Table 1 are the coefficients from the expansion of
the symmetrical binomial (p + q)^n, in which p is the probability of
a correct answer, q is the probability of an incorrect answer (q =
1 - p), and the exponent n represents the number of test items, in
this case 10. The coefficients np were taken from Pascal's
triangle (Ferguson, 1981, p. 91), and are numerically equivalent to
the expected relative frequencies of occurrence of the score values
r if the scores are distributed normally (ibid., p. 89). Thus, to
simulate a normal distribution of ability levels, the np values
from Table 1 were used as frequencies for the corresponding score
levels r. A more manageable sample size of n = 340 was attained by
dividing each np by three. For the uniform ability distribution,
each score occurred with equal frequency. A sample size of n = 324
resulted from 36 persons at each of nine score values.
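
The normal-distribution frequencies can be checked with a few lines of arithmetic. The sketch assumes that each quotient was rounded to the nearest whole person, which the text does not state explicitly, and it covers only the nine score values from r = 1 to r = 9 that were retained in the data sets.

```python
import math

N_ITEMS = 10

# Binomial coefficients for raw scores 1 through 9 (Pascal's triangle row 10,
# without the zero and perfect scores).
pattern_counts = {r: math.comb(N_ITEMS, r) for r in range(1, N_ITEMS)}

# Assumed rounding rule: divide by three and round to the nearest integer.
normal_freqs = {r: round(count / 3) for r, count in pattern_counts.items()}

total_persons = sum(normal_freqs.values())
print(normal_freqs)
print(total_persons)
```

Under that rounding assumption the nine frequencies sum to the reported sample size of 340.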

Rasch analysis discards persons with raw scores of 0% (r = 0)
and 100% (r = 10). Therefore, only scores of r = 1 to r = 9 were
represented in the data sets. Each 10-item subset represented a
unidimensional scale on which a latent trait or ability could be
measured. It was assumed that a subset score r_v varied directly
with ability level b_v; that is, the greater a person's ability
level, the more items he or she would answer correctly. Misfitting
or unexpected responses are always present in empirical
measurement. Misfit was artificially replicated in the data sets
by letting 2/3 of the response patterns in each score group
be Guttman patterns, and the remaining 1/3 be next-most-likely
patterns. The normal and uniform distributions are summarized in
Table 2.



Insert Table 2 Here



To ensure that the two latent ability factors underlying each
20-item test were uncorrelated with each other, the case numbers,
which uniquely identify the hypothetical persons, were arranged
sequentially in one 10-item subset and randomly in the other.
These assignments were made separately for the normal and for the
uniform data sets.

Data Analysis 

The generated data sets were analyzed as follows: 

(1) The dichotomous data were subjected to Rasch analysis,
from which probability values p_vi were obtained, expressing
the likelihood that a person gives a correct response to an
item.

(2) Three correlation matrices were calculated for each data
set: (a) phi (φ) and (b) tetrachoric (r_tet)
for the raw dichotomous data, and (c) Pearson r for the
Rasch values.

(3) Each correlation matrix was subjected to confirmatory
factor analysis (Long, 1983). Each matrix was presumed to
manifest the same underlying factor model, as shown in Figure
1.
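
Steps (1) and (2c) above can be sketched as follows. The abilities and difficulties are made-up logit values standing in for calibrations that the study obtained from Rasch software, and all function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def probability_matrix(abilities, difficulties):
    """Persons-by-items matrix of p_vi values replacing the 0/1 responses."""
    return [[rasch_probability(b, d) for d in difficulties] for b in abilities]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical calibrations for five persons and two items on one subtest.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
difficulties = [-1.0, 1.0]

p = probability_matrix(abilities, difficulties)
item_1 = [row[0] for row in p]
item_2 = [row[1] for row in p]
r_items = pearson_r(item_1, item_2)
print(round(r_items, 4))
```

Because both item columns are smooth functions of the same ability, their Pearson correlation is high but not exactly one; repeating the calculation for every pair of items would fill the Rasch-based correlation matrix submitted to factor analysis.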



The above analyses were evaluated by comparing each on:

(1) The maximum internal correlation ρ(*) (Joe & Mendoza,
1989), which is "a measure of total dependence among a set of
variables . . . [and] is an upper bound to product moment
correlations" (p. 220). The sample estimate of ρ(*) is:

$$ r^{(*)} = \frac{\lambda_1 - \lambda_q}{\lambda_1 + \lambda_q} , $$

where λ_1 and λ_q are, respectively, the largest and smallest
eigenvalues of the sample correlation matrix (ibid., p. 212).
The data sets in the present study were constructed to reflect
maximum internal correlations in the hypothetical population
equal to ρ(*) = 1.00. Therefore, the values of r(*) calculated
for the phi, tetrachoric, and Pearson product-moment
correlation matrices can be used to compare the three
correlation methods, since each r(*) should be equal to unity,
minus the effects of contrived measurement error (caused by
misfitting response patterns, held constant for all three
correlation matrices calculated from a given data set).

(2) Since ρ(*) is also "an upper bound to the product of the
two largest factor loadings" (ibid., p. 220), this product was
compared to the criterion value of ρ(*) = 1.00 and to obtained
values of r(*) for each data set.

(3) The proportion of variance in the input variables
accounted for by the two common factors F1 and F2 is equal to:

$$ \sum_i h_i^2 / L , $$

where Σ h_i² is the sum of the communalities for all
input variables, and L is the number of input variables
(Norusis, 1988, p. B-46). This proportion was compared to a
hypothesized population value of 1.0 for each data set, as a
means of evaluating the factor solutions.
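
The first and third criteria reduce to simple arithmetic once eigenvalues and communalities are available. The sketch below uses made-up eigenvalues and communalities, not values from this study, and the function names are hypothetical.

```python
def max_internal_correlation(eigenvalues):
    """Sample estimate r(*) = (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(eigenvalues), min(eigenvalues)
    return (largest - smallest) / (largest + smallest)


def proportion_of_variance(communalities):
    """Sum of the communalities divided by the number of input variables."""
    return sum(communalities) / len(communalities)


# Eigenvalues of a positive definite correlation matrix (hypothetical).
r_star_ok = max_internal_correlation([2.5, 1.0, 0.5])

# A negative smallest eigenvalue, as can occur with a non-Gramian matrix,
# pushes the estimate above unity.
r_star_bad = max_internal_correlation([2.5, 1.0, -0.1])

# Hypothetical communalities for four variables.
explained = proportion_of_variance([0.64, 0.49, 0.36, 0.25])

print(r_star_ok, r_star_bad, explained)
```

When the smallest eigenvalue is negative the numerator exceeds the denominator, so the estimate rises above its theoretical ceiling of one and cannot be read as an exact value.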

Procedures

SPSS/PC+ (SPSS Inc., 1988) was used to transform the 10-item
generated data sets into 20-item sets, each containing two
uncorrelated 10-item subsets. The 20-item normal and uniform data
sets were then each subjected to separate Rasch analyses, using
MSCALE software (Davis & Wright, 1988; Wright & Stone, 1979).
MSCALE output included a difficulty calibration for each item, and
an ability calibration for each raw score. These calibrations were
used to compute probabilities (p_vi) in the normal and uniform data
sets.

Eigenvalues for the phi, tetrachoric, and Rasch-r correlation
matrices derived from each of the data sets were computed. To
extract eigenvalues, a principal components method was used, rather
than unweighted least-squares, since:

"In the [SPSS/PC+] principal components solution, all initial
communalities are listed as 1's. In all other solutions, the
initial estimate of the communality of a variable is the
multiple R² . . . . These initial communalities are used in the
estimation algorithm." (Norusis, 1988, pp. 50-51)

For eigenvalue computations, it was important to use 1's as initial
communality estimates, because:

"When unities are placed in the principal diagonal of R
[correlation matrix] then usually m = n [factors or principal
components = original variables]. If some numbers less than
unities (estimates of communalities) are placed in the
diagonal, and the positive semi-definite [Gramian] property
of R is preserved, then m will usually be less than n, and
all [eigenvalues] will be real and nonnegative. However,
the reduced correlation matrix R (i.e., with communality
estimates in the diagonal) will not be positive semi-definite
in practice, and both positive and negative [eigenvalues]
may be expected." (Harman, 1976, p. 141)

LISREL software (Jöreskog & Sörbom, 1986) was used for
confirmatory unweighted least-squares factor analysis of the
correlation matrices.

RESULTS 

Of the three factor-analytic methods, the Rasch-based
procedure rendered pattern matrices most similar to the criterion
matrix given normal ability distributions (Table 3). A pattern
matrix contains regression weights for the common factors and a
structure matrix contains correlations between the factors and the
observed variables, but both are similar with standardized
variables (Kim & Mueller, 1978, p. 84). The Rasch-r pattern matrix
contained 20 significant correlations (one-tailed p < .05), as
specified by the criterion structure; the other two approaches each
contained 16. Table 4 indicates the superiority of the Rasch-based
method on two of the three criterion measures described earlier.



Insert Tables 3 & 4 Here 



The Rasch-r pattern matrix indicated the strongest overall
similarity to the criterion matrix, but the tetrachoric method
yielded the highest loadings on six items for the uniform ability
distributions (Table 5). However, the tetrachoric method failed
to produce significant loadings for two items (numbers 1 and 11);
the other methods estimated significant loadings for all 20 items.
Table 6 indicates the Rasch-r method's superiority on the Σh²/20
criterion. The tetrachoric method produced the highest value for
the product of the two largest factor loadings, and the r(*)
measures were inconclusive.



Insert Tables 5 & 6 Here 



CONCLUSIONS 

Rasch-based factor analysis is a viable approach when using
dichotomously-scored item response data. Results indicate that
Pearson correlations of Rasch estimates performed better than those
obtained through phi and tetrachoric correlations among dichotomous
data. The Rasch procedure was chosen in favor of other
psychometric measurement models because only the Rasch model meets
the latent trait analysis assumption that a linear ability
continuum underlies dichotomous item response data. This new
approach should enable researchers to incorporate results from a
variety of dichotomous variables into complex latent variable
models which can be analyzed by techniques available in LISREL
(Jöreskog & Sörbom, 1986).

The results of this study, however, are qualified by the
following irregularities in the findings:

(1) For the criterion measure based on the product of the two
largest factor loadings in uniform distributions, the
tetrachoric value exceeded the Rasch-r value. The difference,
however, was relatively small (.9930 vs. .9264).

(2) Exact r(*) values could not be obtained for several of the
data sets, because of negative eigenvalues.

Analyzing P-values versus Ability-Difficulty Differences 

Since Rasch probability values (p_vi) are restricted to a range
of zero to unity, it may seem more appropriate to factor-analyze
the differences between Rasch-calibrated person abilities and item
difficulties (b_v - d_i), which is a theoretically unbounded quantity.
However, several considerations led to the decision to use p_vi
values rather than (b_v - d_i) differences as input variables in the
present study.

First, even though (b_v - d_i) is theoretically unbounded, in
practice differences outside the range of -2 to +2 occur in only
about 1% of observations (Smith, 1988, pp. 660 & 662).

Second, it has been assumed in this study that an observed
item response evaluated at either zero or unity represents a
dichotomized manifestation of some continuous unobservable
variable. The most logical conceptualization of this unobservable
variable across all possible varieties of cognitive abilities and
objectively-scored measuring instruments is the latent probability
of giving a correct response to an item. Since probabilities of
less than zero or greater than unity are not defined, it is
unreasonable to expect a quantitative probability variable to stray
outside these bounds.

Third, the use of p_vi preserves consistent scaling across
latent variables. Replacing dichotomous responses with (b_v - d_i)
values would produce response measurements differing in scale
across latent variables. Such scale variation could have
significant substantive effects on the results obtained from
covariance-matrix-based applications of a scale-dependent factor
analytic method such as ULS (Long, 1983, p. 79).

Fourth, examinees with perfect scores on a latent ability
scale (r = 100% or r = 0%) are eliminated from the Rasch analysis
of that scale. Using a p_vi-based method, these examinees can be
included in subsequent factor analysis, because their observed
responses of unity or zero are, appropriately, at the extremes of
the p_vi value range. But there can be no (b_v - d_i) values for
examinees with perfect scores, and their original observed
responses are incompatible with (b_v - d_i) scaling. Therefore, these
examinees would be excluded from all phases of (b_v - d_i) based
factor analysis.

Finally, there are precedents in the psychometric literature
for replacing dichotomous values with probabilities prior to factor
analysis. Muthén (1989, p. 21) argued that:

"The y* variable may be thought of as the specific tendency
to report a certain symptom [y] . . . . The relationship
between y and y* leads to a nonlinear relationship between y
and [a common factor F], expressing not the value of y but
the probability of y as a function of [F]. This is
appropriate because what is needed is a response model for a
discrete y, the binomial distribution of which is described
by probabilities."



On the other hand, the literature contains no examples or
arguments in favor of subjecting (b_v - d_i) values to factor
analysis. There may, however, be some advantages to a
factor-analytic method based on (b_v - d_i), which could be profitably
explored in subsequent research.

Despite the benefits expected to be derived from further study
of the new Rasch-r procedure, with a variety of data sets and
perhaps with fitting functions other than ULS, and despite the
limited explanations provided for specific irregularities, the
procedure has clear advantages for researchers who use confirmatory
factor analysis with dichotomous item response data. Therefore,
its use is strongly recommended in such contexts.

Figure 1: Factor model with 20 observed dichotomous x_i items

Legend

δ_i = unique factors; measurement errors in x_i
x_i = observed variables, dichotomously scored (1 = right,
      0 = wrong)
λ_i = loadings of variables x_i on common factors
F1, F2 = common factors
φ_12 = common factor correlation




Table 1: Probable Dichotomous Response Patterns for 11 Score Values

Legend

r = raw score = number of items answered correctly
np = number of possible unique response patterns yielding a
given r. These values are based on the binomial
expansion (p + q)^n, where n = 10.

Table 2: Expected Response Patterns at Nine Score Values for
Normal and Uniform Ability Distributions

r = raw score = number of items answered correctly.
np = number of possible unique response patterns, given r.
fN = frequency of this score in a normal distribution.
fU = frequency of this score in a uniform distribution.
fN1 = frequency of Guttman response pattern for this score
value in a normal distribution.
fU1 = frequency of Guttman response pattern for this score
value in a uniform distribution.
fN2 = frequency of 2nd-most-likely response pattern in a
normal distribution.
fU2 = frequency of 2nd-most-likely response pattern in a
uniform distribution.

Table 3: Pattern Matrix Produced by Three Methods of
Factor-Analyzing Normal Ability Distribution Data Set

Table 4: Comparison of Three Methods of Factor-Analyzing
Normal Ability Distribution Data Set

Method                               φ        r_tet     Rasch-r   Criterion

r(*)                               .7150    >1.000*    >1.000*     1.0000
Product of two largest loadings    .3709     .7370      .7939      1.0000
Σh²/20                             .1812     .3821      .5369      1.0000

Legend

r(*) = estimate of maximum internal correlation
Product of two largest loadings = product of two largest factor
structure loadings
Σh²/20 = proportion of observed variance accounted for
by the two common factors

* Note: The presence of negative eigenvalues prevents the
calculation of an exact r(*) value.







Table 5: Pattern Matrix Produced by Three Methods of
Factor-Analyzing Uniform Ability Distribution Data Set

Table 6: Comparison of Three Methods of Factor-Analyzing
Uniform Ability Distribution Data Set

Method                               φ        r_tet     Rasch-r   Criterion

r(*)                               .8970    >1.000*    >1.000*     1.0000
Product of two largest loadings    .6815     .9930      .9264      1.0000
Σh²/20                             .3807     .6562      .7205      1.0000

Legend

r(*) = estimate of maximum internal correlation
Product of two largest loadings = product of two largest factor
structure loadings
Σh²/20 = proportion of observed variance accounted for
by the two common factors

* Note: The presence of negative eigenvalues prevents the
calculation of an exact r(*) value.